-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[BOLT] Match functions with pseudo probes #100446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: users/shawbyoung/spr/main.bolt-match-functions-with-pseudo-probes
Are you sure you want to change the base?
[BOLT] Match functions with pseudo probes #100446
Conversation
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4
…BF, determine best matching binary inline subtrees, find corresponding binary functions, attach the profile to binary function without a profile and most probes Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Created using spr 1.3.4
Match inline trees first between profile and the binary: by GUID,
checksum, parent, and inline site for inlined functions. Map profile
probes to binary probes via matched inline tree nodes. Each binary probe
has an associated binary basic block. If all probes from one profile
basic block map to the same binary basic block, it’s an exact match,
otherwise the block is determined by majority vote and reported as loose
match.
Pseudo probe matching happens between exact hash matching and call/loose
matching.
Introduce ProbeMatchSpec - a mechanism to match probes belonging to
another binary function. For example, given functions foo and bar:
```
void foo() {
bar();
}
```
profiled binary: bar is not inlined => have top-level function bar
new binary where the profile is applied to: bar is inlined into foo.
Currently, BOLT does 1:1 matching between profile functions and binary
functions based on the name. #100446 will extend this to N:M where
multiple profiles can be matched to one binary function (as in the
example above where binary function foo would use profiles for foo and
bar), and one profile can be matched to multiple binary functions (e.g.
if bar was inlined into multiple functions).
In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile
(existing name-based matching).
Test Plan: Added match-blocks-with-pseudo-probes.test
Performance test:
- Setup:
- Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO
(#79942)
- BOLT fresh: BOLTed Clang using fresh profile,
- BOLT stale (hash): BOLTed Clang using stale profile (collected on
Clang 10K commits back), `-infer-stale-profile` (hash+call block
matching)
- BOLT stale (+probe): BOLTed Clang using stale profile,
`-infer-stale-profile` with `-stale-matching-with-pseudo-probes`
(hash+call+pseudo probe block matching)
- 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
build,
- BOLT profiles are collected on Clang compiling large preprocessed
C++ file.
- Benchmark: building Clang (average of 5 runs), see driver in
aaupov/llvm-devmtg-2022
- Results, wall time, lower is better:
- Baseline no-BOLT: 429.52 +- 2.61s,
- BOLT stale (hash): 413.21 +- 2.19s,
- BOLT stale (+probe): 409.69 +- 1.41s,
- BOLT fresh: 384.50 +- 1.80s.
---------
Co-authored-by: Amir Ayupov <aaupov@fb.com>
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4
Pseudo probe matching (#100446) needs callee information for call probes. Embed call probe information (probe id, inline tree node, indirect flag) into CallSiteInfo. As a consequence: - Remove call probes from PseudoProbeInfo to avoid duplication, making it only contain block probes. - Probe grouping across inline tree nodes becomes more potent + allows to unambiguously elide block id 1 (common case). Block mask (blx) encoding becomes a low-ROI optimization and will be replaced by a more compact encoding leveraging simplified PseudoProbeInfo in #166680. The size increase is ~3% for an XL profile (461->475MB). Compact block probe encoding shrinks it by ~6%. Test Plan: updated pseudoprobe-decoding-{inline,noinline}.test
Pseudo probe matching (#100446) needs callee information for call probes. Embed call probe information (probe id, inline tree node, indirect flag) into CallSiteInfo. As a consequence: - Remove call probes from PseudoProbeInfo to avoid duplication, making it only contain block probes. - Probe grouping across inline tree nodes becomes more potent + allows to unambiguously elide block id 1 (common case). Block mask (blx) encoding becomes a low-ROI optimization and will be replaced by a more compact encoding leveraging simplified PseudoProbeInfo in #166680. The size increase is ~3% for an XL profile (461->475MB). Compact block probe encoding shrinks it by ~6%. Test Plan: updated pseudoprobe-decoding-{inline,noinline}.test Reviewers: paschalis-mpeis, ayermolo, yota9, yozhu, rafaelauler, maksfb Reviewed By: rafaelauler Pull Request: #165490
| for (auto &[ParentId, InlineSiteCSI] : ParentToCSI) { | ||
| for (auto &[InlineSite, CSI] : InlineSiteCSI) { | ||
| auto [CalleeGUID, CallSite] = InlineSite; | ||
| errs() << ParentId << "@" << CallSite << "->" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errs() --> dbgs()
| ++ProbeMatchingStats.MismatchingRootGUID; | ||
| return 0; | ||
| } | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use the extra { }?
| { | ||
| uint64_t BinaryHash = Decoder.getFuncDescForGUID(Root.Guid)->FuncHash; | ||
| uint64_t ProfileHash = FuncNode.Hash; | ||
| if (BinaryHash != ProfileHash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to the stale profile, right?
| ++ProbeMatchingStats.MatchedRoots; | ||
|
|
||
| uint64_t Matched = 1; | ||
| for (const auto &[Idx, ProfileNode] : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some comments for this for loop?
| return Root; | ||
| } | ||
|
|
||
| size_t YAMLProfileReader::matchInlineTreesImpl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some more high-level descriptions(like more details for what each step does and for different cases) for this function? The function is complex, and it would help future readers.
Also do you think we have good test coverage for this function? it looks it all reuses the existing tests, no test for the new feature? Though I understand it probably needs a large binary/profile, if that's the case, i'm also fine as long as you have tested locally.
| BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF, | ||
| const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx, | ||
| ArrayRef<yaml::bolt::InlineTreeNode> ProfileInlineTree, | ||
| MutableArrayRef<const MCDecodedPseudoProbeInlineTree *> Map, float Scale) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks scale is unused
| } | ||
| } | ||
| if (opts::StaleMatchingWithPseudoProbes) { | ||
| SmallVector<StringRef, 0> Suffixes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 -->5
| uint64_t BinaryHash = Decoder.getFuncDescForGUID(Root.Guid)->FuncHash; | ||
| uint64_t ProfileHash = FuncNode.Hash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: constify.
| ProfiledFunctions.emplace(&BF); | ||
| } | ||
|
|
||
| /// Return a top-level binary inline tree node for a given \p BF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// Return a top-level binary inline tree node for a given \p BF | |
| /// Return a top-level binary inline tree node for a given \p BF. |
| BC.Stats.ExactMatchedSampleCount, BC.Stats.StaleSampleCount); | ||
| BC.outs() << format( | ||
| "BOLT-INFO: inference found an exact pseudo probe match for %.2f%% of " | ||
| "BOLT-INFO: inference found pseudo probe match for %.2f%% of " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to print this info if pseudo probes are not present?
| HashFunction HashFunction, YAMLProfileReader::ProfileLookupMap &IdToYamlBF, | ||
| const BinaryFunction &BF, | ||
| const ArrayRef<YAMLProfileReader::ProbeMatchSpec> ProbeMatchSpecs); | ||
| size_t matchWeights(BinaryContext &BC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BinaryContext can be obtained from BinaryFunction.
| BlendedBlockHash BinHash = BlendedHashes[MatchedBlock->Index - 1]; | ||
| LLVM_DEBUG(dbgs() << "Matched yaml block (bid = " << YamlBBIdx << ")" | ||
| << " with hash " << Twine::utohexstr(YamlBB->Hash) | ||
| LLVM_DEBUG(dbgs() << "Matched yaml block (bid = " << YamlBB.Index << ")" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LLVM_DEBUG(dbgs() << "Matched yaml block (bid = " << YamlBB.Index << ")" | |
| LLVM_DEBUG(dbgs() << "BOLT-DEBUG: matched yaml block (bid = " << YamlBB.Index << ")" |
| } | ||
| const BinaryFunctionProfile *Callee = IdToYamLBF.lookup(CSI.DestId); | ||
| if (!Callee) { | ||
| LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count | |
| LLVM_DEBUG(dbgs() << "BOLT-DEBUG: no callee for " << CSI.DestId << " " << CSI.Count |
| } | ||
| // Get callee GUID | ||
| if (Callee->InlineTree.empty()) { | ||
| LLVM_DEBUG(dbgs() << "no inline tree for " << Callee->Name << '\n'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LLVM_DEBUG(dbgs() << "no inline tree for " << Callee->Name << '\n'); | |
| LLVM_DEBUG(dbgs() << "BOLT-DEBUG: no inline tree for " << Callee->Name << '\n'); |
| }); | ||
|
|
||
| assert(!Root.isRoot()); | ||
| LLVM_DEBUG(dbgs() << "matchInlineTreesImpl for " << BF << "@" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LLVM_DEBUG(dbgs() << "matchInlineTreesImpl for " << BF << "@" | |
| LLVM_DEBUG(dbgs() << "BOLT-DEBUG: matchInlineTreesImpl for " << BF << "@" |
|
|
||
| // Match profile function with a lead node (top-level function or inlinee) | ||
| if (Root.Guid != FuncNode.GUID) { | ||
| LLVM_DEBUG(dbgs() << "Mismatching root GUID\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LLVM_DEBUG(dbgs() << "Mismatching root GUID\n"); | |
| LLVM_DEBUG(dbgs() << "BOLT-DEBUG: mismatching root GUID\n"); |
| uint64_t BinaryHash = Decoder.getFuncDescForGUID(Root.Guid)->FuncHash; | ||
| uint64_t ProfileHash = FuncNode.Hash; | ||
| if (BinaryHash != ProfileHash) { | ||
| LLVM_DEBUG(dbgs() << "Mismatching hashes: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LLVM_DEBUG(dbgs() << "Mismatching hashes: " | |
| LLVM_DEBUG(dbgs() << "BOLT-DEBUG: mismatching hashes: " |
and more cases below.
Extended function and block pseudo probe stale matching
Follow-up to #99891.
This change brings flexible profile recovery using pseudo probes:
function inlines profile functions,
the profile has more aggressive inlining compared to the input binary.
Prerequisites
--profile-write-pseudo-probesperf2bolt flag.
--stale-matching-with-pseudo-probeswhich augments hash-basedmatching + profile inference enabled with
--infer-stale-profile(required).
-fpseudo-probe-for-profilingClang option.Performance testing
with different profiles causing different inlining decisions,
build,
Depends on:
Co-authored-by: aaupov@fb.com